Efficient clustered server-side data analysis workflows using SWAMP

Authors

  • Daniel L. Wang
  • Charles S. Zender
  • Stephen F. Jenks
Abstract

Technology continues to enable scientists to set new records in data collection and production, intensifying the need for large-scale tools that can efficiently process and analyze the growing mountain of data. To complement growth in the number of data centers and the volume of data they store, we introduce our Script Workflow Analysis for MultiProcessing (SWAMP) system. Our system provides safe server-side processing capabilities that allow scientists to reuse familiar desktop-based analysis methods expressed as shell scripts. Built-in script compilation isolates file accesses and generates workflows, while a cluster-capable execution engine partitions and executes the resulting workflow. Benchmarks illustrate performance gains of up to 20x, as well as the importance of I/O considerations, which make other computation systems less effective at geoscience data reduction.

Communicated by: H.A. Babaie

This work was supported by the National Science Foundation under grant IIS-0431203.

  • D. L. Wang, SLAC National Accelerator Laboratory, 2575 Sand Hill Road, M/S 97, Menlo Park, CA, USA
  • C. S. Zender, Dept. of Earth System Science, University of California, Irvine, Irvine, CA, USA
  • D. L. Wang · S. F. Jenks, Dept. of Elec. Engn. and Comp. Sci., University of California, Irvine, Irvine, CA, USA

Similar resources

Server-Side Parallel Data Reduction and Analysis

Geoscience analysis is currently limited by cumbersome access and manipulation of large datasets from remote sources. Due to their data-heavy and compute-light nature, these analysis workloads represent a class of applications unsuited to a computational grid optimized for compute-intensive applications. We present the Script Workflow Analysis for MultiProcessing (SWAMP) system, which relocates...

Investigation on Reliability Estimation of Loosely Coupled Software as a Service Execution Using Clustered and Non-Clustered Web Server

Evaluating the reliability of loosely coupled Software as a Service through the paradigm of a cluster-based and non-cluster-based web server is considered to be an important attribute for the service delivery and execution. We proposed a novel method for measuring the reliability of Software as a Service execution through load testing. The fault count of the model against the stresses of users ...

Efficient, Distributed and Interactive Neuroimaging Data Analysis Using the LONI Pipeline

The LONI Pipeline is a graphical environment for construction, validation and execution of advanced neuroimaging data analysis protocols (Rex et al., 2003). It enables automated data format conversion, allows Grid utilization, facilitates data provenance, and provides a significant library of computational tools. There are two main advantages of the LONI Pipeline over other graphical analysis w...

Workflows Hosted In Portals

The WHIP (Workflows Hosted In Portals) project is building a set of software plug-ins to enable interactive, user driven, workflow integration, sharing, and collaboration within Web portals. This software architecture enables the dynamic publishing of workflows, facilitating information exchange and scientific collaboration. The WHIP plug-in provides functionality to perform composition, editin...

On Transaction Design for UML Components

The transaction concept enables the efficient development of concurrent and fault tolerant applications. Transaction services are therefore an essential part of modern component technologies, such as Enterprise JavaBeans, which are used to develop server-side business applications. The container, which is the execution environment of component-based applications, provides the services and uses ...

Journal:
  • Earth Science Informatics

Volume: 2

Publication date: 2009